perm filename HTSWTS.MRC[UP,DOC]13 blob sn#749740 filedate 1984-04-11 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00006 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.DEVICE XGP
C00003 00003	←%3How to start WAITS
C00007 00004	%2FIXING THE SYSTEM:%1
C00012 00005	%2RELOADING WAITS:%1
C00032 00006	%2RESTARTING THE KA-10:%1
C00035 ENDMK
C⊗;
.DEVICE XGP
.!XGPCOMMANDS←"/PMAR=0";
.!XGPLFTMAR←216;
.PAGE FRAME 999 HIGH 80 WIDE
.AREA TEXT LINES 1 TO 999 CHARS 1 TO 80
.PLACE TEXT
.FONT 1 "BASL30";
.FONT 2 "BASI30";
.FONT 3 "BUCK75";
.FONT 4 "FIX25";
.FONT 5 "FIX13X";
.FONT 6 "NGR20";
.TURN ON "←{%α↓_#"
.AT "ffi" ⊂ IF THISFONT ≤ 2 THEN "≠"  ELSE "fαfαi" ⊃;
.AT "ffl" ⊂ IF THISFONT ≤ 2 THEN "α∞" ELSE "fαfαl" ⊃;
.AT "ff"  ⊂ IF THISFONT ≤ 2 THEN "≥"  ELSE "fαf" ⊃;
.AT "fi"  ⊂ IF THISFONT ≤ 2 THEN "α≡" ELSE "fαi" ⊃;
.AT "fl"  ⊂ IF THISFONT ≤ 2 THEN "∨"  ELSE "fαl" ⊃;
←%3How to start WAITS

%2FIND A WIZARD:%1

.BEGIN INDENT 5

Before you do anything, you should try to find a wizard.  Maybe there is
already one working on the problem -- if so, he will be very angry if you
disturb the machine.  Check in the 030 block of offices for ME in 030d;
or, for network problems look for Joe Weening in 353 (near lounge); or, as
a last resort, try Len Bosack's office (7-0445, room 040D).  If necessary,
call a wizard at home (but not in the middle of the night unless it is an
%2urgent%* problem, as defined below).

Phone numbers of wizards:
.BEGIN NOFILL; INDENT 0;

%4
Martin Frost (ME) (9)329-9081 or (9)325-8507 (either phone: 11am-11pm)
	           or 21-192 (beeper: 9am-1am, or any time if urgent)
Joe Weening (JJW) (network problems) 497-1517 (MJH 353)
Len Bosack (LB) (number is posted on Score, call only if urgent)
.END
.BEGIN INDENT 0;
(%1Do %2not%* use 7-4975 to call out on, since that is the number that a
wizard might be trying to call in on.  If it rings, %2answer it%*.)

%1For an %2urgent%* problem you can beep ME at %2any%* time, %2but it better
really be urgent!%*  A problem is %2urgent%* if it is continuing or
repeating, e.g., you can't reload, or a similar emergency.  %2A#plain
system crash doesn't count as urgent%*, unless you are unable to reload
after following the reloading instructions below.  Even wizards don't like
being awakened in the middle of the night.

To call by beeper, dial the 21-xxx number, wait for the beeping to stop,
and then %2describe the problem in 10 seconds or less%*.  Your message is
transmitted right then by radio to the beepee.
.END

Sometimes a wizard will dial up the CTY to fix things from home (possibly
without your knowing it).  When this happens, he may need some local help
from you.  %2Stand by%1 in case he asks you to do something like check
memory lights.

If you can't get in touch with a wizard, you'll have to fix it yourself;
see the instructions below.  After you fix it, make a note in the log with
the date, time and description of the failure (include any message typed
out on the CTY).  %2Sign your log note (with your SAIL programmer name, if
any)%1.  Use the observed log format when making your entry.  Thanks.
.END

**********************************************************************
.SKIP 1
%2FIXING THE SYSTEM:%1

.BEGIN
⊗#Many crashes are bug traps and will print a message %2followed%* by:

←%4Find a WIZARD or type "$P".  $ means ESC.  You're in DDT.%1

If it prints this and you can't find a wizard, try typing [ESC] %4P%1 and
a couple of [RETURN]s.  If you get monitor dots, you're in luck.  Type
%4BEEP%1 and [RETURN] to tell everybody the good news.  %2Don't forget to
log the crash!%1

If after you type %4$P%1 the same thing happens, try %4$P%1 again.  If it
happens repeatedly, you'll have to reload, so go to step 1 below.  Certain
errors, like %4Page Fail, PI in Progress%1, require a wizard's
intervention; without help, the routine for such an error will just retry
the losing instruction, which naturally will fail in the same way again.
Routines for some other errors are able to fix the problem or bypass it and get
the system running again automatically when you type %4$P%1.  So the thing
to do is to try %4$P%1 a few times (if once doesn't fix things) before you
give up and reload (but always try to find a wizard before typing %4$P%1
even once).

⊗#If the system gets a %4NXM%1 (non-existent memory error), you may have to
reset a hung memory (reloading won't work); see step 200 below for how to do that.
Sometimes even resetting the memory won't help; in that case the memories
may have to be reconfigured or fixed.  You should leave that for a wizard
to do.

⊗#If the machine has powered itself off, then the %4FAULT%1 light will be lit
on the KL-10's console PDP-11 front panel (where it says "KL-10", that's
really a PDP-11).  Usually this indicates an air-flow problem in the cpu
or a tripped circuit breaker.  The cause of the fault will be indicated by
one (or more) of several indicator lights inside the back of the console
PDP-11 cabinet, at the bottom.  Before doing anything else, you should see
which indicator lights are on back there.  Usually it is %4AIR FLOW CPU%1
or %4CKT BKR TRIP%1.  Log the problem before continuing.  Then try
very hard to find a wizard.  Do %2NOT%1 power the system back on unless a
wizard tells you to do so!

⊗#If one of the messages %4?10 CLKOP%1 or %4?10 TTI%1 was printed on the
CTY, a memory may be hung or powered off, or the microcode
may be hung; try the command %4MC%* and [RETURN] to see if that helps.  If
not, you may have to reset a memory (see step 200 below) and/or reload
(but in any case, first seek a wizard!).

⊗#The message %4?10 CMD ERR%1 usually means that %4KLDCP%* is not working;
see step 120 below for what to do about that.

⊗#If no message was printed, or a message was printed which doesn't look like any
of those above, you will probably have to reload (if you can't find a
wizard).

⊗#If explicit instructions are given in an error message, follow them.
.END
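
The message cases above can be summarized as a small lookup table.  What
follows is only an illustrative sketch in Python: the table, the function
name and the advice wording are hypothetical paraphrases of this sheet and
are not part of WAITS or KLDCP.  It covers printed CTY messages only, not
the powered-off case.

```python
# Hypothetical triage table. The fragments are the CTY messages quoted above;
# the advice strings paraphrase this sheet. Not part of WAITS or KLDCP.
TRIAGE = [
    ("Find a WIZARD or type",
     "bug trap: type ESC P, retry a few times, reload from step 1 if it keeps happening"),
    ("NXM",
     "possible hung memory: see step 200 (reloading alone will not help)"),
    ("?10 CLKOP",
     "memory or microcode may be hung: try MC, then step 200 and/or reload"),
    ("?10 TTI",
     "memory or microcode may be hung: try MC, then step 200 and/or reload"),
    ("?10 CMD ERR",
     "KLDCP is probably not working: see step 120"),
]
DEFAULT_ADVICE = "no recognized message: you will probably have to reload (step 1)"


def advise(cty_text):
    """Return the advice for the first known fragment found in the CTY text."""
    for fragment, advice in TRIAGE:
        if fragment in cty_text:
            return advice
    return DEFAULT_ADVICE


if __name__ == "__main__":
    print(advise("?10 CMD ERR"))
```

Whatever the table says, the first rule still applies: look for a wizard
before acting on any of this advice.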

**********************************************************************
.SKIP 1
%2RELOADING WAITS:%1

.BEGIN INDENT 5
1.  If there has been a power failure, go to step 105.

2.  Type %4↑X%1 (i.e., hold down %4CTRL%1 and type %4X%1).  The response
should be %4KLDCP%1 (or else it may echo simply as %4↑X%*).  If the
command typed in the next step doesn't echo, try this step again; then if
typing the next step's command still doesn't seem to work, go to step 100.

3.  Type %4SP%* and [RETURN].  This stops the KL-10 and records useful
information, including the PC, on the CTY for later perusal by a wizard.
If this command gets you the message %4?UCODE HUNG%*, then type the
command %4ALL%* and [RETURN]; this logs a few lines of information
so a wizard can figure out how the microcode was hung.  In either case,
go on to step 4 next.

4.  Type %4DS%1 and [RETURN].  If %4DS%1 gives you the proper response of
%4DSKDMP%1 and a star (%4*%1), go to step 5.  If you get the message
%4LOAD DSKDMP - USE LD%1, then you'll have to load the DSKDMP bootstrapper
from DECtape into the PDP-11 by typing the command %4LD BOOT%1 and
[RETURN] %6(if for some reason you are trying to reload from the Ampex
disks instead of the DEC RP07s, then you'll have to have selected/mounted
a different DECtape and the command to use here is %4LD NBOOT1%1)%*.
After doing %4LD BOOT%1, start step 4 over again.  If you get the message
%4DEX ERROR IN DS%1, then perhaps there is a hung memory which needs to be
reset; check the memories and reset any hung one(s) according to step 200,
and then return to the beginning of step 4.  If you've tried all of the
appropriate suggestions in this step and %4DS%* still fails, go to step
105.  If you still get %4DEX ERROR%1 after starting over at step 105, then
there is probably a failing memory and you'll have to get help from a
wizard.

5.  Type %4WAITS%1 and [RETURN].  (%6In certain rare cases, when the
PDP-11 realtime clock isn't working to supply WAITS with the date and
time, the system may ask you for the current date and time; if so, please
be careful to enter them correctly.)%* If the system reloads and starts,
you are winning.  If the system doesn't start, you must get help.  %6If
(and only if!) the CTY says %4?10 CMD ERR%1 at this point, then you may
have to perform step 100c to reload KLDCP.%*  In any case, %2don't forget
to log the cause of the crash and the reload!%1

.END
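
The branching in step 4 is the easiest part of the reload to get wrong, so
here is a minimal sketch of it in Python.  The function name is
hypothetical, and the exact layout of the response text on the CTY is an
assumption; only the message strings themselves come from step 4.

```python
def next_after_ds(response):
    """Map the CTY response to the DS command onto the next action in step 4."""
    text = response.strip()
    # Test for the "not loaded" message first: it also contains "DSKDMP".
    if "LOAD DSKDMP" in text:
        return "type LD BOOT, then repeat step 4"
    if "DEX ERROR" in text:
        return "check memories (step 200), then repeat step 4"
    # Assumed form of the proper response: the name DSKDMP, then a star prompt.
    if "DSKDMP" in text and text.endswith("*"):
        return "go to step 5"
    # Everything else falls through to the microcode/memory reload.
    return "go to step 105"


if __name__ == "__main__":
    for reply in ("DSKDMP\n*", "LOAD DSKDMP - USE LD", "DEX ERROR IN DS", ""):
        print(repr(reply), "->", next_after_ds(reply))
```

The sketch does not track how many times each remedy has been tried; a
person following step 4 should still give up and go to step 105 once the
suggestions have been exhausted.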

.BEGIN INDENT 0
%2Don't come here unless directed to by the steps above.%1
.END

.BEGIN INDENT 5
100.  KLDCP is the PDP-11 console program.  It prompts with "%4>.%1"
(a greater-than sign and a dot).  By typing carriage return, you should be
able to get another such prompt.  If so, KLDCP is running; go to step 1.
If you don't get the KLDCP prompt, continue here with 100a.

100a. Try restarting KLDCP: set 100014 in the PDP-11 switches (switches
15, 3 and 2 up, all the rest down); push HALT/ENABLE down and then back up;
push LOAD ADDRESS down and back up; press START.  KLDCP should
respond with a prompt; if so, go to step 1, else 100b.

100b. Try restarting KLDCP again, this time with 100004 in the address
switches (bits 15 and 2 up).  If you get the KLDCP prompt, go to step 1;
otherwise try one more starting address, namely 100010 (bits 15 and 3 up).
If this finally works, go to step 1, else go to 100c.

100c. If restarting KLDCP fails, KLDCP must be reloaded from DECtape.
Make sure a DECtape labelled "KL10 bootstrap" is mounted on a PDP-11
DECtape drive that is selected to unit 0 and is enabled for "remote"
(i.e., computer) operation.  Press the "LOAD DECTAPE" button (located
above and to the left of the red "Emergency Power Off" button) and hold it
for at least a slow count to one.  The DECtape should spin and eventually
something like %4TCDP monitor%1 should be typed.  Type in %4KLDCP%1 and
[RETURN].  KLDCP should load and type a message like %4Stanford KLDCP -
QMP/EN%1.  If you don't get to TCDP you might try pressing the LOAD
DECTAPE button again.  If you get to TCDP and the %4KLDCP%1 command
doesn't work, get help.
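
The switch settings given in steps 100a and 100b follow directly from the
octal starting addresses, and can be checked with a short Python sketch.
The function name is hypothetical, and the sketch assumes sixteen switches
numbered 0 (rightmost) through 15, which matches the numbering used in
those steps.

```python
from typing import List


def switches_up(octal_address: str) -> List[int]:
    """Return the switch numbers (15 down to 0) that must be up for an octal address."""
    value = int(octal_address, 8)
    # Assumption: a 16-bit switch register, so the address must fit in 16 bits.
    if not 0 <= value < (1 << 16):
        raise ValueError("address does not fit in 16 switches")
    # A switch is up exactly when the corresponding binary digit is 1.
    return [bit for bit in range(15, -1, -1) if value & (1 << bit)]


if __name__ == "__main__":
    for address in ("100014", "100004", "100010"):
        print(address, switches_up(address))
```

For the three starting addresses above this gives switches 15, 3 and 2;
then 15 and 2; then 15 and 3, in agreement with the text.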



.BEGIN INDENT 0
%2Start here after running any diagnostics or after the power has been off
for the KL-10.  Otherwise, don't come here unless directed to by the steps
above.%1
.END

105.  Reloading the KL-10's microcode and configuring the memory.  This is
done by running a bootstrap sequence from the SAIL KLAD pack on the RP06
disk.  The KLAD pack should already be mounted, ready, and write enabled
on the RP06 disk drive.  Here's what to do to reload the microcode and
configure memory:

.BEGIN PREFACE 0; INDENT 0,7; SKIP;

⊗#Set the PDP-11 switches to zero.

⊗#Push the black button labelled (LOAD) DISK.

⊗#A program (RSX20F) will be loaded into the PDP-11 and
started.  It will type many things (it takes about a minute: be
patient).  It will finally say: %4KLI#--#CONFIGURATION FILE WRITTEN%*, at
which point the microcode has been loaded and the memory configured,
and you can go on.

⊗#Perform step 100c (to reload our KLDCP from DECtape) and then continue here.

⊗#Now type %4LD BOOT%* and [RETURN] (or %4LD NBOOT1%* if reloading from the
Ampex disks for some reason).

⊗#Type %4DS%* and [RETURN].  DSKDMP should give its usual response of
%4DSKDMP%* and a star (%4*%1).  (%6If you instead get %4DEX ERROR
IN DS%*, then give the command %4EM 20%* and [RETURN] a couple of times.
If that types out a value from location 20, then now try %4DS%* and
[RETURN] again.  If it works, continue; if not, find help.%1)

⊗#Go to step 5 to finish reloading.
.END


.BEGIN INDENT 0
%2Don't come here unless directed to by the steps above.%1
.END

120.  Reloading KLDCP with the system already running, e.g., after step 5
is successful.  Sometimes the console-11's program, KLDCP, gets clobbered
and fails to work; this may be manifested by the failure of all attempts
from WAITS to reach other hosts via the Ethernet (since the PDP-11
contains the interface to the Ethernet).  Or you may see repeated messages
on the CTY saying %4?10 CMD ERR%1.  If KLDCP seems not to be working, you
can reload it while WAITS is running.  If the CTY is working (i.e.,
the console-11 is somewhat happy), then you can just type %411LOAD%* on
the CTY.  Normally the CTY is not usable with the system if KLDCP is not
happy, so a suitably privileged user must log in and incant either:

.BEGIN SELECT 4; no fill; no just; SKIP;
	11LOAD

%1or%*

	RUN 11LOAD[KL,SYS]
	AGRONK
	KLDCP.L11[KL,SYS]
.END

.END

.SKIP
%2Don't come here unless explicitly directed to by the above instructions.%*

.BEGIN INDENT 5;
200. Resetting hung memories.  If some error condition such as %4DEX ERROR
IN DS%1 or %4NXM%1 indicates that there is probably a hung memory, then
the memory needs to be reset.  There are two types of memories: the MG
memory (in two identical cabinets labelled, in the upper left corners, MGB
and MGA) and the ARM-10M memory (in one cabinet to the right of the two MG
boxes).  The three memory cabinets are located in the row behind the
KL-10.  Before resetting a memory, you should attempt to see if it is
hung.  This is done differently for the two different types of memory.
(If one or more of the three memory boxes has %2no lights on%*, then that
box has probably turned itself off -- in that case, find a wizard rather
than trying to fix it yourself!)

201. MGA and MGB: Each of these cabinets has an array of lights at the
top.  The bottom two rows in this array indicate the status of the two
controllers (cont 0 and cont 1) within each cabinet.  So there are four
controllers to check for being hung (or for having parity errors).  On
each controller's row of status lights, there is at the left end a light
labelled %4UA%1 (for Unit Available); if this light is out, the controller
is hung.  On the right end of each row of status lights is a light
labelled %4PAR ERR%1; if this light is on, then that controller has seen a
parity error.  If you notice a parity error, you should record it and also
record which of the %4RD%1 (read) and %4WR%1 (write) lights is on at the
other end of that row.  If you find a hung MG controller, you should do
the following to reset it:  (a) first push the RESET button on the front
of the C1 Disk Channel (next to the KL-10), and (b) then push the RESET
button on the bottom front of the MG that was hung (in each case, you must
open the magnetic door to get at the RESET button).  DO NOT RESET THE
MEMORY SIMPLY BECAUSE YOU FIND A PARITY ERROR LIGHT ON!  The parity error
light is simply a flag and does not affect memory operation.

202. ARM-10M: This memory has four HUNG lights at the bottom of the main
array of lights (visible through the window).  The four HUNG lights are
spread out, one for each sector, and each one is next to a RESET switch.
(You may not notice the lights if none is on, because of the dark
background, but you should see the word HUNG above a blank space where the
light really is, next to a RESET switch.)  The ARM-10M also has parity
error lights, in a line of four, one for each sector, labelled SECTOR
PARITY ERROR.  And just above those lights are four others labelled SECTOR
CONTROL ERROR.  Before resetting a hung memory, you should note whether
any of the parity error or control error lights are on; if any are on,
record in the log which one(s) they are.  To reset a hung sector, push the
RESET button next to the HUNG light that is on (you do NOT need to reset
the C1 before resetting the ARM-10M).  Again, NEVER RESET A MEMORY JUST
BECAUSE IT HAS A PARITY ERROR LIGHT ON!  You only need reset a memory if
it is actually hung.  If the ARM-10M is hung, you will end up having to
restart the KA-10 (after you get the system running again).

.END
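
The rules in steps 201 and 202 amount to this: log error lights, but reset
only what is actually hung.  A minimal Python sketch of that decision
follows.  The function names and the wording of the action strings are
hypothetical; the light names and the order of the resets come from the
steps above.

```python
def mg_controller_actions(ua_lit, par_err_lit):
    """Actions for one MG controller row (UA is the Unit Available light)."""
    actions = []
    if par_err_lit:
        # A parity light is only a flag: record it, never reset because of it.
        actions.append("log parity error and which of RD/WR is lit")
    if not ua_lit:
        # UA out means the controller is hung; the C1 is reset before the MG.
        actions.append("push RESET on the C1 Disk Channel")
        actions.append("push RESET on the hung MG cabinet")
    return actions


def arm10m_sector_actions(hung_lit, parity_lit, control_lit):
    """Actions for one ARM-10M sector; nothing is reset unless HUNG is lit."""
    if not hung_lit:
        return []
    actions = []
    if parity_lit:
        actions.append("log SECTOR PARITY ERROR for this sector")
    if control_lit:
        actions.append("log SECTOR CONTROL ERROR for this sector")
    # No C1 reset is needed before resetting the ARM-10M.
    actions.append("push RESET next to the lit HUNG light")
    actions.append("restart the KA-10 once the system is running again")
    return actions


if __name__ == "__main__":
    print(mg_controller_actions(ua_lit=False, par_err_lit=True))
    print(arm10m_sector_actions(hung_lit=True, parity_lit=False, control_lit=True))
```

Note that neither function ever returns a reset for a parity light alone,
which is the point of the capitalized warnings in both steps.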
**********************************************************************
.SKIP 1
%2RESTARTING THE KA-10:%1

.BEGIN INDENT 5
These instructions are for restarting the KA-10, which is the secondary
processor (P2).  They assume that the main timesharing system itself is
running; presumably you are reading this because you were told to restart
the KA-10 by an XGP spooling of yours or by WAITS when you reloaded.  As
always, make sure no wizard is already working on it.

The KA-10 is the black computer (the KL-10 is blue).  Its console
panel should be about four feet to the left of this sheet.  It has
a lot of lights and switches on it.

To %2restart%1 the KA, first check the KA's address switches to be sure
that they are set to 204.  If you don't know how to do this, don't worry
since its switches should always be set to 204.  In addition, if the KA
stopped with a memory stop, make a log entry with details of the memory
lossage; the note on the KA console says how to do this.

Now, restart the KA by first pressing the RESET button on the KA-10
console panel and then pressing the START button.  You should get a
message on the KA-10 CTY (a Teletype behind the KA) saying something like
%4KA10 RESTARTED%1.

If you don't get that message, or if it is followed by some other message
that looks like an error message, try reloading the KA-10 (%2not%1 the
regular system!!).

To %2reload%1 the KA, press the KA-10 RESET button again.  Then go to the
KL-10 CTY and type %4P2LOAD%1 and [RETURN].  When it finishes, press the
START button on the KA-10 console panel; the KA-10 CTY (behind the KA)
should say %4KA10 RELOADED...%*.  If this doesn't work now, get help.
.END

**********************************************************************
.SKIP 1
The PUB source for this file is %4HTSWTS.MRC[UP,DOC]%1.  Corrections
marked on this sheet will be noted therein.